Online Domain Adaptation for Multi-Object Tracking
Automatically detecting, labeling, and tracking objects in videos depends
first and foremost on accurate category-level object detectors. These might,
however, not always be available in practice, as acquiring high-quality large
scale labeled training datasets is either too costly or impractical for all
possible real-world application scenarios. A scalable solution consists in
re-using object detectors pre-trained on generic datasets. This work is the
first to investigate the problem of on-line domain adaptation of object
detectors for causal multi-object tracking (MOT). We propose to alleviate the
dataset bias by adapting detectors from category to instances, and back: (i) we
jointly learn all target models by adapting them from the pre-trained one, and
(ii) we also adapt the pre-trained model on-line. We introduce an on-line
multi-task learning algorithm to efficiently share parameters and reduce drift,
while gradually improving recall. Our approach is applicable to any linear
object detector, and we evaluate both cheap "mini-Fisher Vectors" and expensive
"off-the-shelf" ConvNet features. We quantitatively measure the benefit of our
domain adaptation strategy on the KITTI tracking benchmark and on a new dataset
(PASCAL-to-KITTI) we introduce to study the domain mismatch problem in MOT.
Comment: To appear at BMVC 201
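The core idea of adapting detectors "from category to instances, and back" can be sketched as an online multi-task update for linear detectors. The code below is an illustrative sketch only, not the paper's exact algorithm: a hinge-loss step fits each per-instance model, a regularizer pulls instance models toward the shared category model, and the category model is in turn nudged toward the instance models. All names (`adapt_online`, `lam`, `lr`) are assumptions.

```python
import numpy as np

def adapt_online(w_cat, targets, samples, lam=0.1, lr=0.01):
    """One hypothetical online multi-task step (illustrative sketch,
    not the paper's exact updates): each per-instance detector is
    fitted to its new tracking sample while being regularized toward
    the shared pre-trained category model, which itself adapts
    on-line toward the mean of the instance models."""
    for t, (x, y) in zip(targets, samples):
        margin = y * (t["w"] @ x)
        if margin < 1.0:                       # hinge-loss violation
            t["w"] += lr * y * x               # fit the new instance sample
        t["w"] -= lr * lam * (t["w"] - w_cat)  # share parameters, reduce drift
    # adapt the pre-trained category model on-line as well
    w_cat += lr * lam * (np.mean([t["w"] for t in targets], axis=0) - w_cat)
    return w_cat, targets
```

Because every model stays linear, the same update applies unchanged whether the features are "mini-Fisher Vectors" or off-the-shelf ConvNet activations.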
Combination Schemes for Turning Point Predictions
We propose new forecast combination schemes for predicting turning points of business cycles. The schemes weight a given set of models by their forecasting performance, with the aim of producing better turning point predictions. We consider turning point predictions generated by autoregressive (AR) and Markov-switching AR models, which are commonly used for business cycle analysis. To account for parameter uncertainty, we take a Bayesian approach to both estimation and prediction, and we compare, in terms of statistical accuracy, the individual models and the combined turning point predictions for the United States and euro area business cycles.
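A minimal sketch of a performance-weighted combination, assuming a simple exponentiated-log-score weighting rule (an illustration of the general idea, not the paper's exact scheme): each model's predicted turning-point probability is weighted by its normalized past predictive log score, so better-performing models dominate the pooled prediction.

```python
import math

def combine_turning_points(model_probs, past_log_scores):
    """Hypothetical combination scheme: pool each model's predicted
    turning-point probability with weights proportional to the
    exponentiated past predictive log score of that model.
    Names and weighting rule are illustrative assumptions."""
    weights = [math.exp(s) for s in past_log_scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return sum(w * p for w, p in zip(weights, model_probs))
```

With equal past scores this reduces to a simple average; as one model's predictive record improves, the combined turning-point probability shifts toward that model.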
Segment-and-count: Vehicle Counting in Aerial Imagery using Atrous Convolutional Neural Networks
High-resolution aerial imagery can provide detailed and in some cases even real-time information about traffic related objects. Vehicle
localization and counting using aerial imagery play an important role in a broad range of applications. Recently, convolutional neural
networks (CNNs) with atrous convolution layers have shown better performance for semantic segmentation compared to conventional
convolutional approaches. In this work, we propose a joint vehicle segmentation and counting method based on atrous convolutional
layers. This method uses a multi-task loss function to simultaneously reduce pixel-wise segmentation and vehicle counting errors. In
addition, the rectangular shapes of vehicle segmentations are refined using morphological operations. In order to evaluate the proposed
methodology, we apply it to the public "DLR 3K" benchmark dataset which contains aerial images with a ground sampling distance
of 13 cm. Results show that our proposed method reaches 81.58% mean intersection over union in vehicle segmentation and shows an
accuracy of 91.12% in vehicle counting, outperforming the baselines.
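The multi-task loss described above can be sketched as a pixel-wise segmentation term plus a counting term. The snippet below is an illustrative numpy version, assuming binary cross-entropy for segmentation, an absolute counting error, and a hypothetical balancing weight `alpha`; it is not the paper's exact formulation.

```python
import numpy as np

def multitask_loss(pred_probs, gt_mask, pred_count, gt_count, alpha=0.5):
    """Illustrative joint loss in the spirit of the method: a pixel-wise
    binary cross-entropy for vehicle segmentation plus an absolute
    vehicle-counting error, balanced by an assumed weight alpha."""
    eps = 1e-7
    p = np.clip(pred_probs, eps, 1 - eps)
    bce = -np.mean(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p))
    count_err = abs(pred_count - gt_count)
    return bce + alpha * count_err
```

Minimizing the two terms jointly is what lets a single network reduce segmentation and counting errors simultaneously.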
End-to-End Saliency Mapping via Probability Distribution Prediction
Most saliency estimation methods aim to explicitly model low-level conspicuity cues such as edges or blobs and may additionally incorporate top-down cues using face or text detection. Data-driven methods for training saliency models using eye-fixation data are increasingly popular, particularly with the introduction of large-scale datasets and deep architectures. However, current methods in this latter paradigm use loss functions designed for classification or regression tasks, whereas saliency estimation is evaluated on topographical maps. In this work, we introduce a new saliency map model which formulates a map as a generalized Bernoulli distribution. We then train a deep architecture to predict such maps using novel loss functions which pair the softmax activation function with measures designed to compute distances between probability distributions. We show in extensive experiments the effectiveness of such loss functions over standard ones on four public benchmark datasets, and demonstrate improved performance over state-of-the-art saliency methods.
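The loss family described above can be sketched concretely: treat the saliency map as a generalized Bernoulli (categorical) distribution over pixel locations via a softmax, then measure a distance, here KL divergence, to the normalized ground-truth fixation map. A minimal sketch, with variable names as assumptions:

```python
import numpy as np

def saliency_kl_loss(logits, fixation_map):
    """Illustrative loss pairing a softmax over all pixel locations
    (the generalized Bernoulli view of a saliency map) with the KL
    divergence to the normalized fixation distribution."""
    z = logits.ravel() - logits.max()
    p = np.exp(z) / np.exp(z).sum()                # softmax over all pixels
    q = fixation_map.ravel() / fixation_map.sum()  # target distribution
    eps = 1e-12
    return float(np.sum(q * (np.log(q + eps) - np.log(p + eps))))
```

Other probability-distribution distances (e.g. total variation or Bhattacharyya) can be substituted for the KL term without changing the softmax parameterization.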
Vision-based simultaneous localization and mapping in changing outdoor environments
For robots operating in outdoor environments, a number of factors, including weather, time of day, rough terrain, high speeds, and hardware limitations, make performing vision-based simultaneous localization and mapping with current techniques infeasible, owing to image blur and/or underexposure, especially on smaller platforms and low-cost hardware. In this paper, we present novel visual place-recognition and odometry techniques that address the challenges posed by low lighting, perceptual change, and low-cost cameras. Our primary contribution is a novel two-step algorithm that combines fast low-resolution whole-image matching with a higher-resolution patch-verification step, as well as image saliency methods that simultaneously improve performance and decrease computing time. The algorithms are demonstrated using consumer cameras mounted on a small vehicle in a mixed urban and vegetated environment and on a car traversing highway and suburban streets, at different times of day and night and in various weather conditions. The algorithms achieve reliable mapping over the course of a day, both when incrementally incorporating new visual scenes from different times of day into an existing map, and when using a static map comprising visual scenes captured at only one point in time. Using the two-step place-recognition process, we demonstrate for the first time single-image, error-free place recognition at recall rates above 50% across a day-night dataset without prior training or utilization of image sequences. This place-recognition performance enables topologically correct mapping across day-night cycles.
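The two-step algorithm can be sketched as: (1) rank database images by a cheap whole-image comparison at very low resolution, then (2) verify only the best candidates at full resolution. The sketch below uses sum-of-absolute-differences and a central verification patch; all parameters and the matching measure are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def recognize_place(query, database, top_k=3, patch=8):
    """Illustrative two-step place matcher: fast low-resolution
    whole-image matching followed by a higher-resolution
    patch-verification step on the shortlisted candidates."""
    def downsample(img, f=4):
        return img[::f, ::f]
    q_small = downsample(query)
    scores = [np.abs(downsample(db) - q_small).sum() for db in database]
    candidates = np.argsort(scores)[:top_k]      # cheap shortlist
    h, w = query.shape
    qs = query[h//2-patch:h//2+patch, w//2-patch:w//2+patch]
    best, best_err = None, np.inf
    for i in candidates:                         # expensive verification
        ds = database[i][h//2-patch:h//2+patch, w//2-patch:w//2+patch]
        err = np.abs(ds - qs).mean()
        if err < best_err:
            best, best_err = int(i), err
    return best
```

The design point is that the low-resolution pass bounds how many full-resolution comparisons are ever performed, which is what keeps the method viable on low-cost hardware.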
Efficient visual coding and the predictability of eye movements on natural movies
We deal with the analysis of eye movements made on natural movies in free-viewing conditions. Saccades are detected and used to label two classes of movie patches as attended and non-attended. Machine learning techniques are then used to determine how well the two classes can be separated, i.e. how predictable saccade targets are. Although very simple saliency measures are used and then averaged to obtain just one average value per scale, the two classes can be separated with an ROC score of around 0.7, which is higher than previously reported results. Moreover, predictability is analysed for different representations to obtain indirect evidence for the likelihood of a particular representation. It is shown that the predictability correlates with the local intrinsic dimension in a movie.
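The ROC score used to quantify separability of attended versus non-attended patches can be computed directly from the two sets of scalar saliency values via the rank-sum identity, as in this small illustrative helper:

```python
def roc_auc(pos_scores, neg_scores):
    """Minimal ROC-AUC: the probability that a randomly chosen
    attended patch outscores a randomly chosen non-attended one,
    counting ties as half a win (rank-sum identity)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

A value of 0.5 means the single averaged saliency value carries no information about saccade targets; the ~0.7 reported above means attended patches outscore non-attended ones about 70% of the time.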
MRCNet: Crowd counting and density map estimation in aerial and ground imagery
In spite of the many advantages of aerial imagery for crowd monitoring and management at mass events, datasets of aerial images of crowds are still lacking in the field. As a remedy, in this work we introduce a novel crowd dataset, the DLR Aerial Crowd Dataset (DLR-ACD), which is composed of 33 large aerial images acquired from 16 flight campaigns over mass events, with 226,291 persons annotated. To the best of our knowledge, DLR-ACD is the first aerial crowd dataset and will be released publicly. To tackle the problem of accurate crowd counting and density map estimation in aerial images of crowds, this work also proposes a new encoder-decoder convolutional neural network, the so-called Multi-Resolution Crowd Network (MRCNet). The encoder is based on the VGG-16 network and the decoder is composed of a set of bilinear upsampling and convolutional layers. Using two losses, one at an earlier level and another at the last level of the decoder, MRCNet estimates crowd counts and high-resolution crowd density maps as two different but interrelated tasks. In addition, MRCNet utilizes contextual and detailed local information by combining high- and low-level features through a number of lateral connections inspired by the Feature Pyramid Network (FPN) technique. We evaluated MRCNet on the proposed DLR-ACD dataset as well as on the ShanghaiTech dataset, a CCTV-based crowd counting benchmark. The results demonstrate that MRCNet outperforms the state-of-the-art crowd counting methods in estimating the crowd counts and density maps for both aerial and CCTV-based images.
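Density-map-based counting, as used by networks of this kind, rests on one construction: place a normalized Gaussian at each annotated person position so that the map integrates to the person count. The sketch below shows this generic target construction (it is standard practice, not MRCNet-specific code; `sigma` is an assumed bandwidth):

```python
import numpy as np

def density_from_points(points, shape, sigma=2.0):
    """Build a crowd-density target from point annotations: each
    annotated position contributes a Gaussian blob normalized to sum
    to one, so the whole map integrates to the person count."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    density = np.zeros(shape)
    for (py, px) in points:
        g = np.exp(-((yy - py) ** 2 + (xx - px) ** 2) / (2 * sigma ** 2))
        density += g / g.sum()   # each blob contributes exactly one count
    return density
```

At inference time the predicted count is simply the integral (sum) of the predicted density map, which is why count estimation and high-resolution density estimation are interrelated tasks.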
Aerial Road Segmentation in the Presence of Topological Label Noise
The availability of large-scale annotated datasets has enabled Fully-Convolutional Neural Networks to reach outstanding performance on road extraction in aerial images. However, high-quality pixel-level annotation is expensive to produce and even manually labeled data often contains topological errors. Trading off quality for quantity, many datasets rely on already available yet noisy labels, for example from OpenStreetMap.
In this paper, we explore the training of custom U-Nets built with ResNet and DenseNet backbones using noise-aware losses that are robust towards label omission and registration noise. We perform an extensive evaluation of standard and noise-aware losses, including a novel Bootstrapped DICE-Coefficient loss, on two challenging road segmentation benchmarks. Our losses yield a consistent improvement in overall extraction quality and exhibit a strong capacity to cope with severe label noise. Our method generalizes well to two other fine-grained topology delineation tasks: surface crack detection for quality inspection and cell membrane extraction in electron microscopy imagery.
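One plausible reading of a bootstrapped Dice loss, stated here as an assumption rather than the paper's exact definition, follows the classic bootstrapping idea: blend the possibly noisy label with the network's own confident prediction before computing a soft Dice coefficient, so that road pixels omitted from the label hurt the loss less.

```python
import numpy as np

def bootstrapped_dice_loss(pred, target, beta=0.8, eps=1e-7):
    """Hypothetical bootstrapped soft-Dice loss (an illustrative
    assumption, not the paper's exact formulation): mix the noisy
    target with the thresholded prediction, then return 1 - Dice."""
    boot = beta * target + (1 - beta) * (pred > 0.5)  # soft bootstrap target
    inter = (pred * boot).sum()
    dice = (2 * inter + eps) / (pred.sum() + boot.sum() + eps)
    return 1.0 - dice
```

With `beta=1.0` this reduces to the plain soft-Dice loss; lowering `beta` lets a confident prediction partially override an omitted label, which is the behavior a loss robust to label omission needs.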